Introduction

This  project  centers  on  creating  a  molecular  framework  of  DCIS  (ductal  carcinoma  in  situ).  DCIS  is 
considered  to  be  the  precursor  to  Invasive  Ductal  Carcinoma  (IDC),  the  most  common  form  of  breast 
cancer.  IDC  accounts  for  80%  of  all  breast  cancers,  predominantly  affecting  women  aged  55  and  older; 
however,  at  least  a  third  of  women  with  IDC  are  diagnosed  before  they  reach  55. 

Utilizing a unique bank of frozen mammary biopsies, containing samples with DCIS alone and samples
with a combination of DCIS and IDC, we aim to profile both DCIS and related tissue components. We aim
to sample the ~300 biopsies and to compare DCIS lesions, within and between patients, by both RNA-seq
and whole genome amplification, and to see how these may be correlated with IDC lesions. We also
intend to look for changes in the stroma between patients who present with IDC and those who do
not. This work aims to identify characteristics that may be suggestive of a patient's likelihood of
progressing from DCIS to IDC, with the purpose of reducing overtreatment of this disease.
3.

Accomplishments 

Aim  1.  The  evolution  of  DCIS. 

Task  1.  Sample  collection  and  annotation 
Task  2.  Sample  choice  from  frozen  bank. 

We have now received 80 samples from the frozen bank and have processed 70 of these so far. These include pure
DCIS and also mixed DCIS and IDC samples. We selected samples that had 5 or more DCIS lesions for this Aim, as
these will be more informative for looking at the evolution of DCIS.

Task  3.  Laser  capture  of  frozen  samples  for  characterization 

From each of the 70 samples we have dissected material for DNA; however, we have material from 18 patients for
characterization (based on having 5 or more DCIS lesions). We have selected DCIS lesions, IDC regions, normal
epithelium, where present, atypical epithelium, solid DCIS, papillary DCIS, benign epithelium, and stroma (as far
away from DCIS or IDC regions as possible). The table below shows the distribution across patients, with a
total of 214 lesions including the normal and variant epithelium.


Number of       Number of    Number of samples    Number of IDC lesions
DCIS lesions    samples      with IDC             per sample

5               4            2                    4, 4
6               7            3                    8, 2, 4
7               6            3                    5, 7, 5
8               1            0
9               1            1                    5
10              1            1                    6
11              0            0
12              0            0
13              1            0

Total: 144                   10                   50

Task  4.  Exome  capture  and  sequencing 


We initiated work on this using the Nextera Exome Capture kit; however, on the couple of samples we tried, this did
not prove successful, as very few of the probes were represented. Having investigated the costs and
what is needed to get deep enough coverage for accurately calling CNVs and SNVs, we decided to make use of the
X10 sequencing machine at the NYGC and do whole genome sequencing instead. A trial run demonstrated
that the Whole Genome Sequencing kit that we were using (and other kits on the market) was only compatible with
sequencing machines after the DNA had been sheared (resulting in removal of the end primers). This was not efficient
for sequencing on the X10, as reads are generally longer and shearing would result in very short reads. We
therefore established a new protocol based on classical molecular biology, whereby we chewed the
primers off the ends of the DNA strands after amplification with the WGA kit. This then allowed us to attach the primers
for sequencing (attachment was hindered without removal of the WGA primers). This pipeline proved very
effective, and all 214 samples have now finished being sequenced on the X10.


Task  5.  Analyze  Exome  capture  data 

Data from the X10 sequencing are still being processed; however, concordance analysis has been carried out on over
half of the samples. This looks for any discrepancies between a “normal” sample and its paired “tumor” sample.

Pairs generally have over 90% concordance. The analysis has also proved useful as a check on sample identity: it
identified a misread tube label and thus allows us to correct such errors.
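As a rough illustration of the kind of check described above, the following minimal sketch compares genotype calls at sites shared by a "normal" and a "tumor" sample and flags pairs with low agreement. The data layout (a dictionary keyed by chromosome and position), the function names, and the use of 90% as a flagging threshold are assumptions made for illustration; they are not the pipeline actually used at the NYGC.

```python
def concordance(normal, tumor):
    """Fraction of sites genotyped in both samples where the calls agree."""
    shared = normal.keys() & tumor.keys()
    if not shared:
        raise ValueError("no sites genotyped in both samples")
    agree = sum(1 for site in shared if normal[site] == tumor[site])
    return agree / len(shared)


def flag_discordant_pairs(pairs, threshold=0.90):
    """Return the IDs of pairs whose concordance is below the threshold.

    The 0.90 default is an assumed cut-off, chosen only because pairs are
    reported to generally exceed 90% concordance.
    """
    return sorted(
        pair_id
        for pair_id, (normal, tumor) in pairs.items()
        if concordance(normal, tumor) < threshold
    )


# Toy genotype calls (hypothetical values), keyed by (chromosome, position).
normal = {
    ("chr1", 100): "A/A",
    ("chr1", 250): "A/G",
    ("chr2", 40): "C/C",
    ("chr2", 90): "T/T",
}
matched_tumor = dict(normal)
swapped_tumor = {
    ("chr1", 100): "G/G",
    ("chr1", 250): "A/A",
    ("chr2", 40): "C/T",
    ("chr2", 90): "T/T",
}

pairs = {
    "patient_A": (normal, matched_tumor),
    "patient_B": (normal, swapped_tumor),
}
print(flag_discordant_pairs(pairs))
```

A pair that falls well below the usual level, like the second one here, is the kind of case that would prompt a check of tube labels.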

Initial analysis of CNV data has been carried out on 4 patients thus far and shows that there are differences
between DCIS lesions within the same patient. Further, more in-depth analysis will be carried out on the
phylogeny of the DCIS and IDC lesions and on whether there are any associations between the differences we see in the DNA
data and those we see in the RNA data. This is being carried out together with the bioinformaticians at the
NYGC, and we will seek further analysis from groups here at Cambridge who specialize in tumor evolution.


Aim  2.  A  transcriptional  landscape  of  early  breast  cancer. 


Task  6.  Sample  choice  from  frozen  bank. 

-  choose  samples  for  pure  DCIS  and  DCIS  with  microinvasion/IDC 
Task  7.  Laser  capture  of  frozen  samples  for  characterization 

We have now received 80 samples from the frozen bank and have processed 70 of these so far. For each sample the
following regions are annotated by Joe (the pathologist) and dissected in triplicate for RNA: DCIS, IDC, normal
epithelium, atypical epithelium, solid DCIS, papillary DCIS, benign epithelium, areas of high immune infiltration,
stroma adjacent to DCIS, stroma adjacent to IDC and stroma away (as far away from DCIS or IDC regions as
possible). This has provided over 3000 lesions. This task is still ongoing.

Task  8.  RNAseq  library  construction 

Approximately 800 RNA-seq libraries have been sequenced. We are currently focusing on DCIS, IDC, and normal
and other epithelium, and are prioritizing these for sequencing. This task is still ongoing.


Task  9.  Analyze  RNAseq  datasets 

Thus far we have analyzed 318 DCIS, 95 IDC, 69 benign/normal epithelium and 91 stroma away libraries. For
quality control, libraries with a gene assignment rate below 15% and fewer than 20% uniquely mapped reads are removed
from the group analysis. This has resulted in 122 libraries being removed (1.5% DCIS, 2.1% IDC, 2.6% epithelium
and 3.8% stroma away).
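The quality-control rule above can be sketched as a simple filter. This is a minimal illustration, assuming that a library is removed only when both conditions hold (gene assignment below 15% and uniquely mapped reads below 20%); the `LibraryQC` record, its field names, and the example values are hypothetical and not taken from the actual analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryQC:
    """Per-library QC metrics (hypothetical record layout)."""
    name: str
    tissue: str
    gene_assignment_pct: float
    uniquely_mapped_pct: float


def fails_qc(lib, min_assigned=15.0, min_unique=20.0):
    """True if the library falls below both thresholds (assumed reading)."""
    return (
        lib.gene_assignment_pct < min_assigned
        and lib.uniquely_mapped_pct < min_unique
    )


def split_by_qc(libraries):
    """Split libraries into (kept, removed) according to fails_qc."""
    kept = [lib for lib in libraries if not fails_qc(lib)]
    removed = [lib for lib in libraries if fails_qc(lib)]
    return kept, removed


# Made-up example values, for illustration only.
libraries = [
    LibraryQC("lib_001", "DCIS", 62.0, 81.5),
    LibraryQC("lib_002", "IDC", 9.5, 12.0),
    LibraryQC("lib_003", "stroma away", 12.0, 55.0),
]

kept, removed = split_by_qc(libraries)
print([lib.name for lib in removed])
```

If the intended rule were instead that either condition alone is enough to remove a library, the `and` in `fails_qc` would become `or`.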


Figure: Pearson correlations between replicates (scale 0.0 to 1.0), by tissue type (DCIS, Epi, IDC, Stroma Away).
Genes are filtered to those with expression of more than 5 in both replicates; normalized counts are VST-transformed.
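The replicate comparison summarized above can be sketched as follows. This is a simplified illustration under stated assumptions: genes are kept only when both replicates have a count above 5, and a plain log2(count + 1) transform stands in for the variance-stabilizing transformation (VST), which is normally computed with a dedicated package. The function names and toy counts are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(rep_a, rep_b, min_count=5):
    """Correlate two replicates over genes expressed above min_count in both.

    log2(count + 1) is used here as a simple stand-in for VST.
    """
    kept = [(a, b) for a, b in zip(rep_a, rep_b) if a > min_count and b > min_count]
    xs = [math.log2(a + 1) for a, _ in kept]
    ys = [math.log2(b + 1) for _, b in kept]
    return pearson(xs, ys)


# Toy normalized counts for the same genes in two replicates (made-up values).
rep_a = [120, 8, 45, 3, 300, 17]
rep_b = [110, 10, 50, 90, 280, 15]
print(round(replicate_correlation(rep_a, rep_b), 3))
```

Computing one such value per replicate pair and grouping the values by tissue type gives the kind of per-tissue distribution the figure reports.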


Opportunities  for  training  and  professional  development 

Nothing  to  report  (not  intended  for  training) 

Results  disseminated  to  communities  of  interest 

Nothing  to  report 


4. 

Impact 

Nothing  to  report 


5. 

Changes  /  problems 


Nothing  to  report 


6. 

Products 

Nothing  to  report 


7. 

Participants  &  other  collaborating  organizations 
Individuals who worked on the project

Name  :  Greg  Hannon 

Project  Role:  Initiating  PI  -  contributed  to  project  design  and  liaising  with  bioinformatics  team 
Nearest  person  month  worked:  1  CM  (10%  x  13  months) 

Funding support: CR-UK and Royal Society

Name  :  Clare  Rebbeck 

Project  Role:  Co-PI  -  contributed  to  project  design,  staining  strategy,  dissecting  with  the  LCM,  RNA  and 
DNA  library  preparation  and  liaising  with  Bioinformatics  team  and  pathologist. 

Nearest  person  month  worked:  13  CM 

Name  :  Jian  Xian 

Project Role: senior research assistant - contributed to dissecting with the LCM and RNA and DNA library
preparation

Nearest  person  month  worked:  12  CM 

Funding  support:  CR-U 

Name  :  Laurence  de  Torrente 

Project Role: Bioinformatician - contributed to data processing and data analysis

Nearest  person  month  worked:  12  CM 

Name  :  Martin  Fabry 

Project  Role:  student  -  contributed  to  DNA  library  preparation 
Nearest  person  month  worked:  12  CM 

Name  :  Sophie  Watcham 

Project  Role:  research  technician  -  contributed  to  DNA  library  preparation 
Nearest  person  month  worked:  3  CM 


Change  in  active  support  since  last  report 

Nothing  to  report  (this  is  the  first  reporting  period) 

Other  organizations  involved  as  partners 

Duke University - collaboration to provide tissue samples and clinical annotation, as detailed in the grant
application.

New York Genome Center - collaboration with the bioinformatics team to analyze the data, as detailed
in the grant application.


